From Treebanks to Tree-Adjoining Grammars

Authors

  • Fei Xia
  • Martha Palmer
Abstract

Grammars are valuable resources for natural language processing. A large-scale grammar may incorporate a vast amount of information on morphology, syntax, and semantics. Traditionally, grammars are built manually. Hand-crafted grammars often contain rich information, but they require tremendous human effort to build and maintain. As large-scale treebanks have become available over the last decade, there has been much work on extracting grammars automatically from treebanks. Such grammars are called treebank grammars. Much of the previous work on grammar extraction, such as (Shirai, Tokunaga, and Tanaka, 1995; Charniak, 1996; Krotov et al., 1998), generates context-free grammars (CFGs). In this chapter, we present a system, LexTract, which generates both CFGs and lexicalized tree adjoining grammars (LTAGs). Extracting LTAGs is more complicated than extracting CFGs because the two formalisms differ in three ways. First, the primitive elements of an LTAG are lexicalized tree structures (called elementary trees) rather than context-free rules. Therefore, an LTAG extraction algorithm needs to examine a larger portion of a phrase structure to build an elementary tree. Second, because the adjoining operation in LTAG allows an elementary tree to be inserted within another elementary tree, an elementary tree is often formed by gluing together several disconnected parts of a phrase structure. Third, unlike in CFGs, parse trees (also known as derived trees in the LTAG formalism) and derivation trees (which describe how elementary trees are combined to form parse trees) are distinct in the LTAG formalism, in the sense that a parse tree can ...

Treebank grammars may contain less information (e.g., feature structures associated with non-terminal nodes) than some hand-crafted grammars, but they are sufficient for some NLP tasks, such as supertagging and parsing, as described in Section 4.
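To make the notion of a treebank grammar concrete, the following is a minimal sketch of the simpler of the two cases mentioned in the abstract: reading context-free rules off a bracketed phrase-structure tree. It is not LexTract and does not build LTAG elementary trees; the function names `parse_bracketed` and `extract_cfg_rules` and the toy sentence are assumptions introduced only for illustration.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse a Penn-Treebank-style bracketed string into (label, children) tuples.

    Children are either nested (label, children) tuples or plain word strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word at a leaf
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read_node()


def extract_cfg_rules(tree, rules=None):
    """Count one context-free rule per internal node: parent -> child labels."""
    if rules is None:
        rules = Counter()
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    rules[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            extract_cfg_rules(child, rules)
    return rules


# Hypothetical toy tree, not taken from any real treebank.
toy_tree = parse_bracketed("(S (NP (NNP Mary)) (VP (VBZ likes) (NP (NNS trees))))")
rules = extract_cfg_rules(toy_tree)

for (lhs, rhs), count in sorted(rules.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{count}]")
```

Each rule here is read from a single parent and its immediate children. An LTAG extractor, as the abstract notes, has to look at a larger portion of the phrase structure and may have to join disconnected parts of it, which this sketch does not attempt.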

Similar resources

Automated Extraction of Tree Adjoining Grammars from a Treebank for Vietnamese

In this paper, we present a system that automatically extracts lexicalized tree adjoining grammars (LTAG) from treebanks. We first discuss in detail extraction algorithms and compare them to previous works. We then report the first LTAG extraction result for Vietnamese, using a recently released Vietnamese treebank. The implementation of an open source and language independent system for automa...


Coping With Problems In Grammars Automatically Extracted From Treebanks

We report in this paper on an experiment on automatic extraction of a Tree Adjoining Grammar from the WSJ corpus of the Penn Treebank. We use an automatic tool developed by (Xia, 2001) properly adapted to our particular need. Rather than addressing general aspects of the automatic extraction we focus on the problems we have found to extract a linguistically (and computationally) sound grammar a...


Encoding Frequency Information in Lexicalized Grammars

We address the issue of how to associate frequency information with lexicalized grammar formalisms, using Lexicalized Tree Adjoining Grammar as a representative framework. We consider systematically a number of alternative probabilistic frameworks, evaluating their adequacy from both a theoretical and empirical perspective using data from existing large treebanks. We also propose three orthogon...


PreRkTAG: Prediction of RNA Knotted Structures Using Tree Adjoining Grammars

Background: RNA molecules play many important regulatory, catalytic and structural ...


Comparing Lexicalized Treebank Grammars Extracted From Chinese, Korean, And English Corpora

In this paper, we present a method for comparing Lexicalized Tree Adjoining Grammars extracted from annotated corpora for three languages: English, Chinese and Korean. This method makes it possible to do a quantitative comparison between the syntactic structures of each language, thereby providing a way of testing the Universal Grammar Hypothesis, the foundation of modern linguistic theories. 1...




Publication date: 2007